[AI Image Processing in 30 Days] [Day 07] Dividing Scene Structure with DeepLSD: Separating Detected Line Segments into Vertical and Horizontal Lines!

2024 iThome Ironman Contest

DAY 7

AI/ ML & Data

AI Image Processing in 30 Days series, part 7

16th Ironman Contest

twm_pt_dat

2024-09-19 09:23:21

Introduction to DeepLSD

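DeepLSD is a general-purpose line segment detector that combines the robustness of deep learning with the accuracy of handcrafted detectors. The model extracts structural line segments from a wide variety of images, but it does not classify them as vertical or horizontal. If those segments can be split into horizontal and vertical lines, the perspective of the scene can be estimated, which helps a great deal when compositing an object image into a scene image. This post therefore looks at how to use the package to obtain the vertical and horizontal lines of a scene image.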

Example Images

The figure below shows the three example images used to present the results. For convenience, from left to right we call them kitchen.png, recording_room.jpg, and your_logo_here.jpg.
[Figure: eg3pic]

Parameter Settings

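DeepLSD's config has a parameter called grad_thresh that affects the average length of the output line segments, the total number of segments, and the computation time per image:
[Figure: avg_length]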

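The figure above shows the average segment length produced for each image at different grad_thresh values. The figure below shows the total number of segments for each image, together with the computation time for the three images, at different grad_thresh values.
[Figure: performance_timecost]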
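
The two metrics plotted here, segment count and average segment length, can be computed directly from a segment array. The following is a minimal sketch, assuming segments are stored as an (N, 2, 2) array of endpoint (x, y) pairs, the layout used by the functions later in this post. The helper name segment_stats and the toy data are made up for illustration and are not part of DeepLSD.

```python
import numpy as np


def segment_stats(lines: np.ndarray) -> tuple:
    """Return (segment count, mean length in pixels) for an (N, 2, 2) array."""
    if len(lines) == 0:
        return 0, 0.0
    # Endpoint difference along the last axis gives (dx, dy) per segment
    deltas = lines[:, 1, :] - lines[:, 0, :]
    lengths = np.linalg.norm(deltas, axis=1)
    return len(lines), float(lengths.mean())


# Two toy segments: a 3-4-5 diagonal and a horizontal of length 10
toy = np.array([[[0, 0], [3, 4]],
                [[0, 0], [10, 0]]], dtype=np.float32)
count, mean_len = segment_stats(toy)
print(count, mean_len)  # 2 7.5
```

Running this once per grad_thresh value and per image would give the kind of numbers the two charts compare: fewer, longer segments indicate less fragmented output.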

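From these two figures we can see that with grad_thresh set to 2, the total number of segments is relatively low and the average segment length is relatively long, i.e. the segments are the most complete and least fragmented, while the computation time per image stays moderate. All subsequent demonstrations therefore use this value for grad_thresh (the result of running the DeepLSD model on the three example images with this setting is shown below).
[Figure: grad_thresh2]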
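
For orientation, the grad_thresh choice above would sit in the model's config dict roughly as follows. This is a sketch under assumptions: the key layout is modelled on the example config in the DeepLSD README as best I recall it, and every value other than grad_thresh is simply that example's setting, not something tuned in this post. Check the key names against the repository before relying on them.

```python
# Assumed layout, modelled on the example config in the DeepLSD README;
# verify the key names against the repository before use.
conf = {
    'detect_lines': True,           # ask the model to return line segments
    'line_detection_params': {
        'merge': False,
        'filtering': True,
        'grad_thresh': 2,           # the value chosen in this post
        'grad_nfa': True,
    },
}
```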

Detailed instructions for DeepLSD are all available on its GitHub, so they are not repeated here.

Separating Horizontal from Vertical

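Once we have DeepLSD's output, we can compute each segment's slope from the difference between its endpoint coordinates and separate horizontal from vertical based on that slope: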

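from typing import Tuple

import numpy as np
from numpy.typing import NDArray


def classify_lines_with_slope(out: dict) -> Tuple[NDArray[np.float32], NDArray[np.float32]]:
    # Segments of the first image, shape (N, 2, 2): [segment, endpoint, (x, y)]
    lines = out['lines'][0]
    x1, y1 = lines[:, 0, 0], lines[:, 0, 1]
    x2, y2 = lines[:, 1, 0], lines[:, 1, 1]

    # Perfectly vertical segments have an undefined slope; treat it as infinite.
    # np.where still evaluates the division everywhere, so silence its warnings.
    vertical_mask = x2 == x1
    with np.errstate(divide='ignore', invalid='ignore'):
        slopes = np.where(vertical_mask, np.inf, (y2 - y1) / (x2 - x1))
    # |slope| <= 1 means the segment lies within 45° of the x-axis
    horizontal_mask = np.abs(slopes) <= 1

    # Everything that is not horizontal (including infinite slopes) is vertical
    vertical_lines = lines[~horizontal_mask]
    horizontal_lines = lines[horizontal_mask]

    return horizontal_lines, vertical_lines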
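
As a variant on the slope test, the same split can be expressed with angles, which avoids dividing by zero for vertical segments and makes the threshold adjustable. This is only a sketch, not the method used in this post: the function name classify_lines_by_angle and the max_angle_deg parameter are hypothetical, and it takes the (N, 2, 2) segment array directly rather than the DeepLSD output dict. Unlike the slope version, a zero-length segment lands in the horizontal group here.

```python
import numpy as np


def classify_lines_by_angle(lines: np.ndarray, max_angle_deg: float = 45.0):
    """Split (N, 2, 2) segments into (horizontal, vertical) by angle to the x-axis."""
    dx = lines[:, 1, 0] - lines[:, 0, 0]
    dy = lines[:, 1, 1] - lines[:, 0, 1]
    # arctan2 of the absolute values folds every direction into [0°, 90°]
    # and needs no special case for dx == 0
    angles = np.degrees(np.arctan2(np.abs(dy), np.abs(dx)))
    horizontal_mask = angles <= max_angle_deg
    return lines[horizontal_mask], lines[~horizontal_mask]


toy = np.array([[[0, 0], [10, 1]],    # nearly flat
                [[5, 0], [5, 8]],     # exactly vertical
                [[0, 0], [4, 9]]],    # steep diagonal
               dtype=np.float32)
horizontal, vertical = classify_lines_by_angle(toy)
print(len(horizontal), len(vertical))  # 1 2
```

With the default of 45° this draws the boundary in the same place as |slope| <= 1, apart from possible rounding for segments at exactly 45°. A smaller max_angle_deg would make the horizontal group stricter.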

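classify_lines_with_slope is a Python implementation of this slope-based split: pass in the dict output by DeepLSD and it returns a tuple of NDArrays holding the horizontal and vertical segments. A segment counts as horizontal when the absolute value of its slope is at most 1, i.e. when it lies within 45° of the x-axis. The DeepLSD output dict also contains a key called line_level, whose data can likewise separate horizontal from vertical. It assigns every pixel coordinate in the image a value between 0 and π; visualized, it looks like this:
[Figure: line_lv]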

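This line_level map of your_logo_here.jpg shows that the more horizontal a region is, the closer its value is to 0 or π, while the more vertical a region is, the farther its value is from both ends (toward π/2). This property can also be used to separate horizontal from vertical: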

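def classify_lines_with_level(out: dict, tolerance: float = 0.25) -> Tuple[NDArray[np.float32], NDArray[np.float32]]:
    # Needs the same imports as classify_lines_with_slope (numpy as np, Tuple, NDArray)
    lines = out['lines'][0]
    # line_level is a torch tensor; convert it to a 2-D (H, W) NumPy array
    line_level = out["line_level"].squeeze().detach().cpu().numpy()

    lower_bound = 0.0
    upper_bound = np.pi  # max of line_level is π
    # tolerance is a fraction of the 0~π range, so 0.25 allows ±π/4
    atol = (upper_bound - lower_bound) * tolerance
    # Sample line_level at each segment's midpoint, as integer (x, y) pixels
    mid_coords = (lines[:, 0, :] + lines[:, 1, :]) / 2
    mid_coords = mid_coords.astype(int)

    # The map is indexed [row, column], i.e. [y, x]
    mid_lvls = line_level[mid_coords[:, 1], mid_coords[:, 0]]

    # Horizontal if the midpoint's level is near either end of the range
    horizontal_mask = np.isclose(mid_lvls, lower_bound, atol=atol) | np.isclose(mid_lvls, upper_bound, atol=atol)
    horizontal_lines = lines[horizontal_mask]
    vertical_lines = lines[~horizontal_mask]

    return horizontal_lines, vertical_lines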
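
To see what the tolerance does without running the model, the midpoint lookup and the "near 0 or π" test can be tried on a synthetic map. This is a minimal sketch: the helper name horizontal_by_level, the 4x4 map, and the midpoints are all made up for illustration, and the distance-to-nearest-end formulation is my own restatement of the two isclose checks rather than code from DeepLSD.

```python
import numpy as np


def horizontal_by_level(levels: np.ndarray, tolerance: float = 0.25) -> np.ndarray:
    """True where an angle in [0, π] lies within tolerance·π of 0 or π."""
    # Distance to the nearest "horizontal" end of the range
    dist = np.minimum(levels, np.pi - levels)
    return dist <= np.pi * tolerance


# Toy 4x4 map: left half nearly horizontal (0.1 rad), right half vertical (π/2)
level_map = np.full((4, 4), np.pi / 2)
level_map[:, :2] = 0.1

# Midpoints as (x, y); note the map is indexed [row=y, col=x]
mid_xy = np.array([[0, 3], [3, 1]])
sampled = level_map[mid_xy[:, 1], mid_xy[:, 0]]
print(horizontal_by_level(sampled))  # [ True False]
```

With tolerance=0.25 the cutoff is π/4 from either end, which corresponds to the same 45° boundary as a slope threshold of 1. The difference is that only the single pixel at the segment midpoint is consulted.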

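classify_lines_with_level is a Python implementation of the line_level-based split: pass in the dict output by DeepLSD and it likewise returns a tuple of NDArrays holding the horizontal and vertical segments (tolerance sets how close to 0 or π a value must be, as a fraction of the 0 to π range, to count as horizontal). In testing, however, the slope-based method turned out to be both more accurate and faster than the line_level method, so using line_level to separate horizontal from vertical is not recommended.
[Figure: result]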